Similarity Search in Arbitrary Subspaces Under Lp-Norm

Authors

  • Xiang Lian
  • Lei Chen
Abstract

Similarity search has been widely used in many applications, such as information retrieval, image data analysis, and time-series matching. Specifically, a similarity query retrieves all data objects in a data set that are similar to a given query object. Previous work on similarity search usually considers the search problem in the full space. In this paper, however, we propose a novel problem, subspace similarity search, which finds all data objects that match a query object in a subspace instead of the original full space. In particular, the query object can specify an arbitrary subspace with an arbitrary number of dimensions. Since traditional approaches for similarity search cannot be applied to the proposed problem, we introduce an efficient and effective pruning technique, which assigns scores to data objects with respect to pivots and prunes candidates via these scores. We propose an effective multipivot-based method to pre-process data objects by selecting appropriate pivots, where the entire procedure is guided by a formal cost model such that the pruning power is maximized. Finally, the scores of each data object are organized in sorted lists to facilitate efficient subspace similarity search. Extensive experiments have verified the correctness of our cost model and demonstrated the efficiency and effectiveness of our proposed approach for subspace similarity search.
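The abstract describes pruning with pivot-based scores kept in sorted lists but does not give the scoring formula, so the sketch below is only a minimal illustration under stated assumptions: it swaps in a simpler, well-known filter rather than the authors' multipivot method. It relies on one fact that holds for any Lp norm with p >= 1: a single coordinate difference |q_i - o_i| never exceeds the Lp distance over any subspace that contains dimension i. Keeping each dimension's values in a sorted list therefore lets a range query binary-search one queried dimension for candidates and compute exact distances only for those. The names `subspace_lp`, `SortedDimensionIndex`, and `range_query` are hypothetical.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def subspace_lp(x: Vector, y: Vector, dims: Sequence[int], p: float) -> float:
    """Lp distance between x and y using only the coordinates in dims (p >= 1)."""
    diffs = [abs(x[i] - y[i]) for i in dims]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


class SortedDimensionIndex:
    """Hypothetical index: one sorted list of (value, object_id) pairs per dimension.

    This is a swapped-in per-dimension filter, not the paper's multipivot scoring.
    """

    def __init__(self, data: List[Vector]):
        self.data = data
        n_dims = len(data[0])
        self.lists: List[List[Tuple[float, int]]] = [
            sorted((obj[i], oid) for oid, obj in enumerate(data))
            for i in range(n_dims)
        ]

    def _bounds(self, dim: int, lo: float, hi: float) -> Tuple[int, int]:
        """Slice [start, end) of dimension `dim` whose values lie in [lo, hi]."""
        entries = self.lists[dim]
        # (lo, -1) sorts before every (lo, oid); (hi, n) sorts after every (hi, oid).
        start = bisect.bisect_left(entries, (lo, -1))
        end = bisect.bisect_right(entries, (hi, len(self.data)))
        return start, end

    def range_query(self, query: Vector, dims: Sequence[int],
                    radius: float, p: float = 2.0) -> List[int]:
        """Ids of objects whose Lp distance to query, restricted to dims, is <= radius."""
        if not dims:
            raise ValueError("the query subspace must contain at least one dimension")
        # For p >= 1, |q_i - o_i| <= d_S(q, o) for every i in S, so objects outside
        # [q_i - radius, q_i + radius] on any queried dimension can be pruned safely.
        bounds = {i: self._bounds(i, query[i] - radius, query[i] + radius) for i in dims}
        # Filter on the most selective queried dimension (fewest candidates) ...
        best = min(dims, key=lambda i: bounds[i][1] - bounds[i][0])
        start, end = bounds[best]
        candidates = [oid for _, oid in self.lists[best][start:end]]
        # ... then refine the survivors with the exact subspace distance.
        return sorted(oid for oid in candidates
                      if subspace_lp(query, self.data[oid], dims, p) <= radius)


# Usage: 4 objects in 3-D, queried in the subspace formed by dimensions 0 and 1.
data = [[0.0, 0.0, 0.0], [1.0, 5.0, 1.0], [0.5, 0.2, 9.0], [3.0, 3.0, 3.0]]
index = SortedDimensionIndex(data)
print(index.range_query([0.0, 0.0, 0.0], dims=[0, 1], radius=1.0))  # [0, 2]
```

Picking the queried dimension with the fewest candidates is a cheap stand-in for the cost-model-guided pivot selection the abstract mentions. The filter stays exact (no false dismissals) because each per-dimension bound is a true lower bound on the subspace Lp distance.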

Similar resources

mm-GNAT: index structure for arbitrary Lp norm

For fast ε-similarity search, various index structures have been proposed. Yi et al. proposed the concept of multimodality support and suggested inequalities by which ε-similarity search under the L1, L2 and L∞ norms can be realized. We proposed an extended inequality which allows us to realize ε-similarity search under an arbitrary Lp norm using an index based on the Lq norm. In these investigations a search radius o...

Fast Time Sequence Indexing for Arbitrary Lp Norms

Fast indexing in time sequence databases for similarity searching has attracted a lot of research recently. Most of the proposals, however, typically centered around the Euclidean distance and its derivatives. We examine the problem of multimodal similarity search in which users can choose the best one from multiple similarity models for their needs. In this paper, we present a novel and fast i...

Subspace Similarity Search: Efficient k-NN Queries in Arbitrary Subspaces

There are abundant scenarios for applications of similarity search in databases where the similarity of objects is defined for a subset of attributes, i.e., in a subspace, only. While much research has been done in efficient support of single column similarity queries or of similarity queries in the full space, scarcely any support of similarity search in subspaces has been provided so far. The...

On Projection Based Operators in lp Space for Exact Similarity Search

We investigate exact indexing for high dimensional lp norms based on the 1-Lipschitz property and projection operators. The orthogonal projection that satisfies the 1-Lipschitz property for the lp norm is described. The adaptive projection defined by the first principal component is introduced.

Publication date: 2008